refactor: contain image converter file paths within root_path #11229
Open
etairl wants to merge 2 commits into deepset-ai:main from
_extract_image_sources_info joined a Document's caller-supplied ``file_path`` metadata with ``root_path`` via ``Path(root_path, file_path)`` without any containment check. Absolute paths in metadata short-circuit the join and ``..`` segments resolve outside the directory, so any caller that propagates document metadata from less-trusted sources can end up reading files outside the configured root. Reject absolute file_path values up-front, resolve the joined path, and verify the result is the configured root or a descendant of it. Raise ValueError with a clear message when these checks fail. Behavior for documents with relative paths inside root_path is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
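The containment check described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the helper name `resolve_within_root` and the error message wording are assumptions, and only the `Path(root_path, file_path)` join and the `ValueError` come from the description above.

```python
from pathlib import Path


def resolve_within_root(root_path: str, file_path: str) -> Path:
    """Hypothetical helper: join file_path onto root_path and reject escapes.

    Name, signature, and messages are illustrative assumptions.
    """
    # Step 1: reject absolute paths up-front, since Path(root, absolute)
    # silently discards the root.
    if Path(file_path).is_absolute():
        raise ValueError(f"file_path must be relative, got {file_path!r}")

    # Step 2: resolve both sides so '..' segments and symlinks are collapsed
    # before comparing. resolve() does not require the path to exist.
    root = Path(root_path).resolve()
    candidate = (root / file_path).resolve()

    # Step 3: accept only the root itself or one of its descendants.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"file_path {file_path!r} resolves outside {str(root)!r}")

    return candidate
```

Resolving the root as well as the candidate matters when the configured root is itself reached through a symlink; comparing a resolved candidate against an unresolved root could reject legitimate paths.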
@etairl is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
Member comment: See #11226 (review)
Release-note linter rejects single backticks for inline code. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- _extract_image_sources_info (in haystack/components/converters/image/image_utils.py) was joining a Document's file_path metadata field with root_path via Path(root_path, file_path) and then calling open() on the result.
- Absolute paths short-circuit the join (Path('/data', '/etc/passwd') == Path('/etc/passwd')) and ../ segments resolve outside the directory. Any caller that propagates document metadata from less-trusted sources therefore ends up reading files outside the configured root.
- The fix rejects absolute file_path values up-front, resolves the joined path, and verifies that the result is root_path or a descendant of it before opening the file.

The function is used by, among others, LLMDocumentContentExtractor and DocumentToImageContent. The behavior for documents whose file_path is a normal relative path inside root_path is unchanged.

Test plan
- DocumentToImageContent / LLMDocumentContentExtractor unit tests.
- meta['file_path'] = 'subdir/image.png' resolves and loads as before.
- meta['file_path'] = '/etc/passwd' raises ValueError (absolute path).
- meta['file_path'] = '../../../etc/passwd' raises ValueError (escapes root).

🤖 Generated with Claude Code
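To see why the two rejected cases in the test plan escape the root when no check is applied, the unchecked join can be reproduced in isolation. This is only a sketch: the helper name `naive_join` and the `/data` root are illustrative, and it uses `PurePosixPath` with `posixpath.normpath` so that it behaves the same on any platform. Note that `normpath` is purely lexical, whereas a real check would resolve symlinks as well.

```python
import posixpath
from pathlib import PurePosixPath


def naive_join(root_path: str, file_path: str) -> str:
    """Illustrative only: the unchecked join, normalized lexically."""
    # PurePosixPath(root, file) discards root when file is absolute.
    joined = PurePosixPath(root_path, file_path)
    # normpath collapses '..' segments without touching the filesystem.
    return posixpath.normpath(str(joined))


# Relative path inside the root: stays inside.
inside = naive_join("/data", "subdir/image.png")

# Absolute path: the root is dropped entirely.
absolute = naive_join("/data", "/etc/passwd")

# Parent-directory segments: the path climbs out of the root.
climbing = naive_join("/data", "../../../etc/passwd")

print(inside)    # /data/subdir/image.png
print(absolute)  # /etc/passwd
print(climbing)  # /etc/passwd
```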